Mixing properties of growing networks and the Simpson's paradox 
We analyze the mixing properties of growing networks and find that, in some cases, the assortativity patterns are reversed once the direction of links is considered: the disassortative behavior observed in such networks is a spurious effect, and a careful analysis reveals genuine positive correlations. We prove our claim by analytical calculations and numerical simulations for two classes of models based on preferential attachment and fitness. This counterintuitive phenomenon is a manifestation of the well-known Simpson's paradox. Results concerning mixing patterns may have important consequences, since they reflect on structural properties such as resilience, epidemic spreading and synchronization. Our findings suggest that a more detailed analysis of real directed networks, such as the World Wide Web, is needed.

Complex networks arise in a wide range of interacting structures, including social, technological and biological systems. Although all these networks share some generic statistical features, such as the small world property and the scale-invariance of the degree distribution, they also display differences and peculiarities when their structure is examined in detail.

A distinctive characteristic of a network is whether its nodes tend to connect to similar or unlike peers, the so-called mixing property. Similarity of nodes is established by comparing some node-dependent scalar quantity measuring a given quality. Borrowing terms from sociology, networks where properties of neighboring nodes are positively correlated are called assortative, while those showing negative correlations are called disassortative. Thus, assortative and disassortative mixing patterns indicate a generic tendency to connect to similar or dissimilar peers, respectively. A scalar quantity naturally associated with each node in a network is its degree, measuring the number of neighboring nodes. The mixing by degree (MbD) is often measured by looking at how the average degree $K_{nn}$ of the nearest neighbors of a node depends on the degree $K$ of the node itself, and is a signature of correlations between other network quantities. The mixing is assortative when $K_{nn}$ grows with $K$ and disassortative when it decreases. The relevance of MbD lies in that, beyond discriminating among different network morphologies, it reflects important structural properties. Assortative networks are found to be more resilient against the removal of vertices than disassortative ones. This implies, for example, that, when trying to block infection or opinion spreading within a social network, or to protect a computer network against cyber-attacks, different strategies are needed depending on the MbD properties of the underlying network. Moreover, it has recently been observed that the sign of degree correlations affects other properties of complex networks such as synchronization [11].

Recent studies show that social networks exhibit assortative MbD, whereas technological and biological ones display disassortative MbD [12]. The World Wide Web (WWW), a paradigmatic example of world-wide collaborative effort among millions of users and publishers, represents an anomaly: one would expect it to show assortative mixing, similarly to other social and collaborative networks, while it shows evidence of anticorrelations and disassortative MbD [13], which would rather put it in the realm of technological networks.

We aim to show that, in networks where a direction is naturally associated with the links, as in growing networks, it is crucial to distinguish between nearest neighbors along incoming and outgoing links. In the WWW case, for example, links with different direction have different roles and meanings: the outgoing links are drawn by individual webmasters, who have no control over incoming links. In the language of Kleinberg, a page gains authority from incoming links, while it increases a peer's authority by pointing to it. Nevertheless, the WWW has often been analyzed and modeled as an undirected network as far as its mixing properties are concerned.

Our main result is that, in most cases, assortativity patterns are reversed when the direction of links in a network is taken into account: positive correlations between the degree of a node and the average degree of both upstream and downstream neighbors, considered separately, can disappear or even reverse when the different nature of neighboring sites is ignored and their degrees are averaged together. Though this result may appear counterintuitive, the fact that pooling together data of different nature can generate spurious correlations is well known in the statistical literature, and often encountered in social sciences, medical statistics and finance, where, although it contains no logical contradiction, it is known as Simpson's paradox [15].
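The reversal described above can be illustrated with a deliberately artificial example. The sketch below is not taken from any of the models studied here: the functions `up` and `down` are hypothetical per-class averages, and the split of a node's neighbors into $k-1$ upstream and one downstream is a toy assumption. Both class averages increase with the degree $k$, yet the pooled average decreases, because the share of the class with the smaller values grows with $k$.

```python
def pooled_average(k, up, down):
    """Mean neighbor degree of a node of degree k, ignoring link direction.

    Toy assumption: the node has k - 1 upstream neighbors and 1 downstream one.
    """
    return ((k - 1) * up(k) + down(k)) / k


def up(k):
    """Hypothetical upstream average: small and weakly increasing with k."""
    return 2.0 + 0.01 * k


def down(k):
    """Hypothetical downstream average: large and strongly increasing with k."""
    return 50.0 + 1.0 * k


degrees = [2, 4, 8, 16, 32]
pooled = [pooled_average(k, up, down) for k in degrees]
for k, value in zip(degrees, pooled):
    # Columns: degree, upstream average, downstream average, pooled average.
    print(k, up(k), down(k), round(value, 2))
```

The numerical values carry no meaning beyond the illustration; only the opposite trends of the per-class and pooled columns matter.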

We show our result on two classes of complex growing networks: the linear preferential attachment (LPA) model, and the Bianconi-Barabasi (BB) fitness model [19]. Both include as a special case the Barabasi-Albert (BA) model [20]. In such growing network models, links have a natural direction, from newly added nodes to existing ones. Thus, in the following we distinguish between upstream and downstream neighbors, respectively along incoming and outgoing links.

To clarify our argument, we consider in detail the BA model (where calculations are simpler) before moving to the LPA and BB models. In the BA model, at each time step a node is added and attached to the network by $m$ undirected links with preferential attachment. A node $i$ (introduced at time $i$) points to existing nodes $j$ with probability $p_j(i)$ proportional to their degree $K_j(i)$ at time $i$ [20]. Since $m$ sets a natural scale for the system, we will express all quantities in units of $m$, and denote them with a tilde. On average, the degree of node $i$ grows in time as $\tilde K_i(t) \simeq \sqrt{t/i}$ for $1 < i \ll t$. The average degrees of the neighbors of $i$, in units of $m$, read

$$\tilde K_{nn}^{in}(i,t) = \sum_{j=i+1}^{t} \frac{\tilde K_j(t)\, p_i(j)}{\tilde K_i(t) - 1}$$

and

$$\tilde K_{nn}^{out}(i,t) = \sum_{j=1}^{i-1} \tilde K_j(t)\, p_j(i),$$

where $\tilde K_{nn}^{in}$ and $\tilde K_{nn}^{out}$ refer to the degree of upstream and downstream neighbors respectively. By approximating the sum by an integral and the degree by its average, one gets $\tilde K_{nn}^{in}(i,t) \simeq \log\sqrt{t/i}\,/\,(1 - \sqrt{i/t})$ and $\tilde K_{nn}^{out}(i,t) \simeq \sqrt{t/i}\,\log(A\sqrt{i})$, where $A$ is a constant of order one whose exact value depends on the initial condition. At a given time $t$, we can express the above quantities in terms of $\tilde K$ and drop the $i$ dependence to get $\tilde K_{nn}^{in} \simeq \tilde K \log \tilde K/(\tilde K - 1)$ and $\tilde K_{nn}^{out} \simeq \tilde K \log(\bar K/\tilde K)$, where $\bar K = A\sqrt{t}$ is of the order of (and greater than) the maximum $\tilde K$ observable at time $t$, which is $\tilde K_{max} \sim \sqrt{t}$. Thus $\tilde K_{nn}^{in}$ is a monotonically (slowly) increasing function of $\tilde K$, independent of $t$, while $\tilde K_{nn}^{out}$ contains a $t$ dependence through $\bar K$ and for any $t$ is an increasing function of $\tilde K$. We conclude that the degree of a node is positively correlated with the average degree of both upstream and downstream neighbors. However, computing the average degree of the neighbors altogether, correlations are lost and one gets $\tilde K_{nn}(t) \simeq \log(A\sqrt{t})$, independent of $\tilde K$. These results are confirmed by numerical simulation of the BA model and shown in Fig. 1, where histograms of $\tilde K_{nn}^{in}$, $\tilde K_{nn}^{out}$, and $\tilde K_{nn}$ are plotted as functions of $\tilde K$ for $t = 10^4$ and $m = 100$, averaged over $10^4$ realizations.

FIG. 1: Histogram of $\tilde K_{nn}^{in}$ (squares), $\tilde K_{nn}^{out}$ (triangles), as functions of $\tilde K$ from simulations of the BA model, with $m = 100$, $t = 10^4$, and averaged over $10^4$ realizations. The behavior of $\tilde K_{nn}$ is shown in the inset. The solid lines are the analytic calculations.

Let us now focus on the LPA model, a generalization of the BA model: according to the same dynamics, at the $i$-th time step $m$ directed links are drawn from $i$ to $j$ with probability $p_j(i) \propto k_j(i) + a$, where $k_j(i)$ is the in-degree of site $j$ at time $i$. For $a = m$, the BA model is recovered. When dealing with the LPA model, it is convenient to measure quantities in units of $a$. In the continuum time limit, the time dependence of the in-degree is $\tilde k_i(t) = (t/i)^{\beta} - 1$ with $\beta = (1 + a/m)^{-1}$. The degrees are power-law distributed with exponent $\gamma = 2 + a/m$ [18]. The calculation of the average in-degree of upstream and downstream neighbors can be performed in analogy to the BA model. The average degree of upstream neighbors reads $\tilde k_{nn}^{in} \simeq (\tilde k + 1)\log(\tilde k + 1)/\tilde k - 1$, as in the BA model, since it is independent of the ratio $a/m$, and is monotonically increasing. The average degree of downstream neighbors is given by



$$\tilde k_{nn}^{out} \simeq (1-\beta)(\tilde k + 1)\,\ldots$$

which is also an increasing function of the in-degree $\tilde k$. It contains an explicit dependence on $a/m$ through $\beta$ and on $t$ through $\bar k$ ($\bar k > \tilde k_{max} \sim t^{\beta}$). Note that now $\tilde k_{nn}^{in}$ and $\tilde k_{nn}^{out}$ count incoming links only. Instead, when ignoring the direction of links by averaging the degree over all of a node's neighbors, one gets an expression for $\tilde k_{nn}$ in which two different regimes appear, for $a/m < 1$ ($\beta > 1/2$) and $a/m > 1$ ($\beta < 1/2$), separated by $a = m$, where the LPA model coincides with the BA model. The average in-degree of nearest neighbors increases as a function of $\tilde k$ for $\beta < 1/2$, while it decreases for $\beta > 1/2$. The two regimes correspond to qualitatively different behaviors of the degree distribution: for $a/m > 1$ the distribution has finite variance in the thermodynamic limit ($\gamma > 3$), while $a/m < 1$ corresponds to $2 < \gamma < 3$, with diverging variance in the same limit. Summarizing, for the LPA model the degree of a node is positively correlated with the average degree of both upstream and downstream nearest neighbors. However, the average degree over all nearest neighbors increases or decreases for different values of the parameter $\beta$. A behavior similar to the $\beta > 1/2$ case was already observed in simulations of a weighted directed model for the WWW. In Figs. 2 and 3 we show the results of our calculation, compared with simulations of the LPA model for $m = 100$, $t = 10^4$, and $a = 5$ for the $\beta > 1/2$ regime (Fig. 2), and $m = 100$, $t = 10^4$, and $a = 500$ for the $\beta < 1/2$ regime (Fig. 3).

FIG. 2: Histogram of $\tilde k_{nn}^{in}$ (squares), $\tilde k_{nn}^{out}$ (triangles), as functions of $\tilde k$ for $t = 10^4$, from simulations of the LPA model with $m = 100$ and $a = 5$ ($\beta > 1/2$), and averaged over $10^4$ realizations. The behavior of $\tilde k_{nn}$ is shown in the inset. The solid lines are the analytic calculations.

FIG. 3: Histogram of $\tilde k_{nn}^{in}$ (squares), $\tilde k_{nn}^{out}$ (triangles), as functions of $\tilde k$ for $t = 10^4$, from simulations of the LPA model with $m = 100$ and $a = 500$ ($\beta < 1/2$), and averaged over $10^4$ realizations. The behavior of $\tilde k_{nn}$ is shown in the inset. The solid lines are the analytic calculations.
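The kind of measurement reported in these figures can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the code used for the figures: the function names `grow_lpa`, `neighbor_averages` and `binned` are hypothetical, the network starts from a single seed node, repeated links between the same pair of nodes are allowed, raw in-degrees are used rather than rescaled ones, and the parameters are far smaller than those quoted above (a single realization instead of an average over many). Setting `a = m` gives the BA special case.

```python
import random
from collections import defaultdict


def grow_lpa(t, m, a, seed=0):
    """Grow a directed LPA network with t nodes; return its (source, target) links.

    Assumptions of this sketch: one seed node (node 0), a > 0, and repeated
    links between the same pair of nodes are allowed.
    """
    rng = random.Random(seed)
    edges = []
    hits = []  # one entry per received link: rng.choice(hits) picks j with prob. ~ k_j
    for i in range(1, t):
        new_targets = []
        for _ in range(m):
            # Mixture sampling of p_j ~ k_j + a over the i existing nodes:
            # weight a*i goes to a uniform pick, weight len(hits) to an in-degree pick.
            if rng.random() * (a * i + len(hits)) < a * i:
                new_targets.append(rng.randrange(i))
            else:
                new_targets.append(rng.choice(hits))
        for j in new_targets:  # degrees are updated only after all m draws
            edges.append((i, j))
            hits.append(j)
    return edges


def neighbor_averages(edges, t):
    """Per node: (in-degree, mean in-degree of upstream, downstream, all neighbors)."""
    indeg, outdeg = [0] * t, [0] * t
    for i, j in edges:
        outdeg[i] += 1
        indeg[j] += 1
    up_sum, down_sum = [0] * t, [0] * t
    for i, j in edges:
        up_sum[j] += indeg[i]    # i points to j, so i is upstream of j
        down_sum[i] += indeg[j]  # j is downstream of i
    rows = []
    for v in range(t):
        if indeg[v] == 0 or outdeg[v] == 0:
            continue  # skip nodes lacking one of the two neighbor classes
        rows.append((indeg[v],
                     up_sum[v] / indeg[v],
                     down_sum[v] / outdeg[v],
                     (up_sum[v] + down_sum[v]) / (indeg[v] + outdeg[v])))
    return rows


def binned(rows, column):
    """Average one column of `rows` in logarithmic bins of the in-degree k."""
    acc = defaultdict(list)
    for row in rows:
        acc[row[0].bit_length()].append(row[column])
    return {b: sum(v) / len(v) for b, v in sorted(acc.items())}


edges = grow_lpa(t=2000, m=5, a=5, seed=1)  # a = m: the BA special case
rows = neighbor_averages(edges, t=2000)
for name, col in (("upstream", 1), ("downstream", 2), ("pooled", 3)):
    print(name, binned(rows, col))
```

Changing `a` relative to `m` moves the simulation between the two regimes discussed above; at this size and with one realization the binned averages are noisy, so the output should be read as qualitative only.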

Let us now turn our attention to the BB model [19]. The BB model was proposed as a realistic model for the WWW, and represents a paradigm for disassortatively mixed networks. Here, the preferential attachment mechanism is modified to embody the intrinsic heterogeneity of nodes. This is done by assigning to each node $j$ a quenched random variable, or fitness, $\eta_j$. The network is grown by adding a node at each time step and connecting it to $m$ existing nodes chosen with probability proportional to both their degree and fitness, $p_j(i+1) \propto \eta_j (k_j(i) + m)$. Now $k_j(i)$ depends on the single history of the network, and on the quenched variables $\{\eta_i\}_{i=1}^{t}$. However, for any given realization of the quenched disorder the degree can be approximated by $k_j(t) \simeq m\left((t/j)^{\eta_j/c} - 1\right)$, where $c$ is a constant that depends on the probability distribution of the fitness. Thus, even though $k_j(t)$ is a function of all fitnesses, it essentially depends only on the value of the fitness at site $j$. This approximation is found to be very accurate numerically, and we will use it in what follows. Also, we approximate $p_j(t)$ by replacing the normalization factor $\sum_{i=1}^{t} \eta_i (k_i(t) + m)$ with its average value $mct$ [20]. In the same notation as above, we will measure quantities in units of $m$. The average degree of downstream neighbors is given by $\tilde k_{nn}^{out}(i,t) = \sum_{j=1}^{i-1} \langle p_j(i)\, \tilde k_j(t) \rangle$; similarly, the average degree of upstream neighbors is $\tilde k_{nn}^{in}(i,t) = \sum_{j=i+1}^{t} \langle p_i(j)\, \tilde k_j(t) / \tilde k_i(t) \rangle$, where brackets represent the average over $\eta_j$. Using the above approximations and computing the averages, one gets an expression for these quantities as functions of $i$ and $\eta_i$. The $\tilde k$ dependence of $\tilde k_{nn}$ is then obtained by selecting pairs $(i, \eta_i)$ that give rise to a degree $\tilde k$ after $t$ steps, which can be sampled numerically. The results for a uniform distribution of fitness in $[0, 1]$ are shown in Fig. 4, where they are compared with results from direct simulations. Also in this case the degree of a node is positively correlated with the average degree of both upstream and downstream neighbors. However, as shown by Pastor-Satorras et al., the average degree of the nearest neighbors decreases as a function of the degree.
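The growth rule of the BB model can be sketched as follows. This is an illustrative simulation under assumptions that are not in the text: the function name `grow_bb` is hypothetical, the network starts from one seed node, repeated links are allowed, and the size is far smaller than in Fig. 4. The only quantity taken from the text is the estimate of $c$, obtained by dividing the normalization factor $\sum_i \eta_i (k_i + m)$ by $mt$.

```python
import random


def grow_bb(t, m, seed=0):
    """Grow a BB fitness network with t nodes; return (in_degrees, fitness).

    Node i attaches m links to older nodes j drawn with probability
    proportional to eta_j * (k_j + m), with eta uniform in [0, 1).
    Assumptions of this sketch: one seed node, repeated links allowed.
    """
    rng = random.Random(seed)
    fitness = [rng.random()]
    indeg = [0]
    for i in range(1, t):
        # Attachment weights are frozen at the degrees the nodes have before step i.
        weights = [fitness[j] * (indeg[j] + m) for j in range(i)]
        for j in rng.choices(range(i), weights=weights, k=m):
            indeg[j] += 1
        fitness.append(rng.random())
        indeg.append(0)
    return indeg, fitness


t, m = 1500, 5
indeg, fitness = grow_bb(t, m, seed=2)

# Normalization factor sum_i eta_i (k_i + m), which the text replaces by m*c*t.
norm = sum(eta * (k + m) for eta, k in zip(fitness, indeg))
c_est = norm / (m * t)
print("estimated c:", round(c_est, 3))

# Mean in-degree of low- versus high-fitness nodes.
low = [k for eta, k in zip(fitness, indeg) if eta < 0.5]
high = [k for eta, k in zip(fitness, indeg) if eta >= 0.5]
print(sum(low) / len(low), sum(high) / len(high))
```

The weights are recomputed from scratch at every step, which costs of order $t^2$ operations; this keeps the sketch short but would need a smarter sampling scheme at the sizes used for the figures.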

In summary, we have demonstrated the crucial role of link directions in the analysis of mixing patterns in complex networks, by showing that assortativity patterns are often reversed once a network is considered as directed. In the growing complex network models we have analyzed, we find positive correlations between the degree of a node and the average degrees of both upstream and downstream nodes, while fictitious correlations emerge when the different nature of the nodes is not taken into account. This is an example of Simpson's paradox, which may occur any time data from different sources are pooled together. The correlation that appears in the pooled data is spurious: a positive correlation between two quantities before pooling becomes negative after pooling, and vice versa. In the particular case of growing networks, the degrees of upstream and downstream neighbors of a node are positively correlated with the degree of the node itself; however, the correlation with upstream neighbors is much weaker. For increasing degrees, the fraction of weakly correlated neighbors increases. The overall average degree of the neighbors can then decrease as a result of the varied proportion, misleadingly suggesting the presence of negative correlations. In the case of BA networks, this effect exactly balances that of positive correlations.
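The exact balance in the BA case can be made explicit with a short worked step. It uses the continuum results for the BA model, $\tilde K_{nn}^{in} \simeq \tilde K \log \tilde K/(\tilde K - 1)$ and $\tilde K_{nn}^{out} \simeq \tilde K \log(\bar K/\tilde K)$ with $\bar K = A\sqrt{t}$, and assumes that, in units of $m$, a node of degree $\tilde K$ has $\tilde K - 1$ incoming links and one outgoing link, so that the pooled average is the correspondingly weighted mean:

```latex
\begin{align*}
\tilde K_{nn}
  &= \frac{(\tilde K - 1)\,\tilde K_{nn}^{in} + 1 \cdot \tilde K_{nn}^{out}}{\tilde K} \\
  &\simeq \frac{\tilde K \log \tilde K + \tilde K \log(\bar K/\tilde K)}{\tilde K}
   = \log \bar K = \log\!\left(A\sqrt{t}\right).
\end{align*}
```

The weight $(\tilde K - 1)/\tilde K$ of the slowly growing upstream term increases with $\tilde K$, and in this case it compensates the growth of both terms exactly, leaving a pooled average that does not depend on $\tilde K$.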

Our findings suggest the need for more detailed analysis 
of real directed networks, such as the WWW, with a spe- 
cial focus on the direction of links between nodes. The 
counterintuitive properties described above may explain 
the anomalous exclusion of the WWW from the realm 
of social networks based on its observed disassortative 
mixing. 
FIG. 4: Histogram of $\tilde K_{nn}^{in} = \tilde k_{nn}^{in} + 1$ (squares), $\tilde K_{nn}^{out} = \tilde k_{nn}^{out} + 1$ (triangles), as functions of $\tilde K = \tilde k + 1$ for $t = 10^4$, from simulations of the BB model with $m = 10$, and averaged over $10^4$ realizations. The behavior of $\tilde K_{nn} = \tilde k_{nn} + 1$ is shown in the inset. The solid lines are histograms from integration of the analytic expression.

We thank Miguel-Angel Munoz for useful discussions.